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Preface 

Electronic journals represent a significant and growing part of the academic 
library's offerings. As demand for e-joumals increases, librarians are faced 
with a new set of decisions related to acquisitions and services. Must libraries 
retain both print and electronic copies? Is the price of the electronic copy jus- 
tified by its use? Do usage patterns show that some journals wiU be as heavily 
used — or more so — in 20 years as when they are published? Answers to these 
and other questions require statistics on usage, and in the electronic realm, 
such statistics must come from the publishers. 

Unfortunately, it has been difficult, if not impossible, for librarians to ob- 
tain meaningful usage data from publishers of electronic journals. The reason 
is not a simple matter of publishers being unwilling to provide such informa- 
tion, even though some complain that implementing a data collection func- 
tion is costly and others fear that librarians wiU cancel subscriptions if they 
learn that usage is low. A more basic problem is that there is no agreement on 
how to produce data that can be compared and analyzed. It has been exceed- 
ingly difficult for librarians to know what to ask for when something as basic 
as the term "use" can have many meanings. 

CLIR commissioned Judy Luther to review how and what statistics are 
currently collected and to identify the issues that must be resolved before li- 
brarians and publishers feel comfortable with the data and confident in using 
them. In her extensive interviews with librarians and publishers, the author 
found significant common ground on the types of concerns held. 

We hope that, in identifying some of the critical issues, this white paper 
will provide a basis for discussion among publishers, librarians, and aggrega- 
tors that will lead to effective cooperation in collecting and analyzing usage 
statistics. CLIR will continue to pursue the agendas that librarians and pub- 
lishers share as they make the transition to the digital environment and they 
find new ways of meeting users' needs. 



Deanna Marcum 
President 




7 



White Paper on Electronic Journal Usage Statistics 



1 



Introduction 



E xpenditures on journals comprise about 70 percent of the av- 
erage academic library's budget for materials. Research li- 
braries in the United States spend more than $500 million on 
journals annually and several large libraries estimate that they spend 
20 percent of their budgets on electronic materials. 

When scholarly journals are converted to electronic form, they 
are frequently offered as part of a database hosted by a publisher or 
an aggregator and made accessible through the Web. As users shift 
from using local print materials to using remote files, librarians seek 
to collect usage data that justify the library's investment in electronic 
resources. Because the library licenses, but does not own, the content, 
it must depend on the provider for usage data. 

Providing usage data is a new role for publishers and aggrega- 
tors — one that requires not only much learning but also a financial 
investment. While it appears that the data would be as useful to pub- 
lishers as to librarians, publishers must first develop the capability to 
serve their own purposes and then provide additional analyses and 
support to present the data so that librarians can use them. 

Less than half of the publishers who offer journals in electronic 
form today are able to provide statistics on the usage of these jour- 
nals. What is available varies widely among publishers, and librari- 
ans are often unclear about what to ask for and how they will use the 
data. Guidelines for compiling statistics are just em.erging and have 
not been widely adopted. 

Publishers are concerned that the data they share with librarians 
lack context. If, in the absence of such a context, usage data seem 
low, the publishers fear that librarians may use such information as a 
basis for canceling subscriptions. As both librarians and publishers 
become more familiar with the current state of usage statistics, the 
focus of the conversation will shift to what needs to be done to en- 
sure consistency and to provide a valid context for understanding 
the data. There have been rapid developments in this area in the six 
months since this study began, and the author is encouraged by re- 
cent discussions with publishers who were previously reluctant to 
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provide data to libraries and are now inquiring about what should 
be delivered. 

This paper provides a snapshot of developments in the industry. 
It identifies issues that concern both publishers and librarians and 
suggests a context for further discussion between the providers and 
consumers of electronic journals. 

Background 



Since libraries that host electronic journal content locally face the 
same challenges in collecting usage statistics as do publishers, the 
author chose to interview librarians at OhioLINK, Los Alamos Na- 
tional Labs, and the Florida Center for Library Automation to deter- 
mine how they provide such data to their consortia members. Pub- 
lishers and providers were then interviewed to compare their 
approaches to collecting and presenting the data with those of the 
libraries. Finally, comments from both groups were solicited to iden- 
tify their concerns and to establish a base for understanding and in- 
terpreting the data. Summaries of selected interviews are provided in 
Appendix A. 

To provide a frame of reference for this study, current initiatives 
by other organizations on developing data collection policies were 
reviewed. In addition to guidelines published by the International 
Coalition of Library Consortia (ICOLC), which were based on the 
JSTOR initiative, there is substantial work being done in this country 
by the Association of Research Libraries ( ARL), the National Infor- 
mation Standards Organization (NISO), and the National Commis- 
sion on Libraries and Information Science (NCLIS), and abroad by 
the European Commission. The ICOLC guidelines may be found in 
Appendix B; information on related industry initiatives appears in 
Appendix C. While some of the studies focus on defining data ele- 
ments and collecting statistics, others examine data in light of factors 
used to assess performance, such as the percentage of user popula- 
tion being served. 

Issues Affecting Librarians and Pubiishers 

Among the most important findings of this study is that librarians 
and publishers share a significant number of concerns about the de- 
velopment and interpretation of statistics. Both are seeking agree- 
ment on core data that are needed and are exploring an appropriate 
context for interpretation. Once publishers and providers discover 
how to produce comparable and reliable data, it will be possible to 
continue discussions about usage and value to the user. 



Issues of Common Concern to Librarians and 
Publishers 

All indicators of usage are steadily rising, in part because of the con- 
tinued growth of electronic content available on the desktop. How- 
ever, in the electronic world, there are more variables that affect the 
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analysis of statistics and an understanding of the results than there 
are in the print world. For a balanced picture, librarians and publish- 
ers will want to consider how the following variables affect their 
data, assessments, and conclusions. 

Lack of comparable data 

The issue of greatest concern to publishers and librarians is the lack 
of comparable data. Variations in definitions and implementation 
procedures make it impossible to compare data from different host 
environments with any degree of reliability. 

Unless data on multiple publishers are collected from the same 
platform (such as OhioLINK or Catchword) with common hardware 
and shared software, variations in how items are identified and 
counted will skew the results. What is counted (e.g., searches, ab- 
stracts displayed, HTML pages viewed, PDF documents download- 
ed) and how (whether internal use, such as demonstrations, and ex- 
ternal use, such as spider hits or "robot contamination," are 
excluded) will vary according to the software used. 

Librarians currently receive reports with different data elements 
that are not clearly defined and that cover different time periods, 
making it impossible to analyze them in a consistent way. Publishers 
likewise find it difficult to reconcile internal data that are produced 
from different systems that count data in different ways. 

Lack of context 

With insufficient data from the print environment and insufficient 
experience in the rapidly changing electronic environment, it is not 
possible to establish a context for understanding data available on 
the current level of online activity. What little data librarians have on 
the use of print cannot serve as a basis for projections on the use of 
electronic journals. 

Current measures are limited to data on the amount of activity, 
such as the number of downloads. To base comparisons on the use of 
large or very popular journals (such as Nature, Science, or Cell) sets an 
artificially high benchmark for other titles with fewer articles avail- 
able for use. This raises the question of whether the measure of activ- 
ity should be relative to another factor, such as the price of the jour- 
nal or the number of available articles, which puts the measure in a 
context. 

Both publishers and librarians emphasize that measures of the 
level of activity do not indicate the value of an article. It is dangerous 
to assume that a popular title that is used by many students is worth 
more than a research title that is used by only a few faculty members 
working in a specific discipline. Other factors need to be considered. 

Known differences in information-seeking behavior among users 
in various scientific disciplines warrant additional study to identify 
usage patterns. As more data are examined on use and behavior, it 
may be possible to establish average levels of use for different subject 
areas or user groups. 
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Incomplete usage data 

Constructing a complete picture of use is further complicated by the 
existence of journals in multiple formats that are available through 
multiple sources or distribution channels, e.g., directly from the pub- 
lisher, onsite at a library, or through a vendor such as OCLC. This 
means publishers must combine usage data for journals that are 
mounted on a remote host, such as OCLC or OhioLINK, with data 
for journals kept on their own Web sites. Libraries are confronted 
with multiple sources of usage data, or the lack thereof, for different 
formats (print, electronic, microfilm) and for multiple copies of titles 
that are available from several sources. 

Marketing 

Publishers who make usage data available are aware that this infor- 
mation will be used to assess the value of their journals. Consequent- 
ly, they want to ensure that usage is high so that the cost per-use is 
low compared with that of other publications. Publishers and librari- 
ans with experience in electronic databases agree that marketing to 
users — whether librarians or library patrons — and making them 
aware of the availability of the resource and its features have a no- 
ticeable impact on usage. 

It can take from sixteen months to three years for users to inte- 
grate into their routines changes in how they access information. For 
that reason, the amount of time a database has been available influ- 
ences usage rates (Townley and Murray 1999). Elsevier's experience 
with The University Licensing Program (TULIP) and Pricing Elec- 
tronic Access for Knowledge (PEAK) taught publishers that it is es- 
sential to promote the availability of a journal database to users and 
to allow time for user behavior to change. 

Both librarians and publishers involved in the PEAK project ac- 
knowledged that publicity and promotion made a difference in lev- 
els of use. At Vanderbilt University, the medical school's use of the 
electronic journals was disproportionately low because the medical 
library was reluctant to publicize the use of a system that its staff 
considered to be temporary (Haar 2000). 

Content provided 

The demand for specific electronic titles is affected by both the time- 
liness of content and the amount of content provided. Some publish- 
ers release articles in electronic form before publishing the print ver- 
sion or choose to delay the electronic version for a few issues or for a 
year so as not to affect current subscriptions. 

A collection becomes more useful when the amount of archival 
content available online increases, especially if it is well indexed. 
When back files are included with the current subscription or basic 
service, the user has more articles to view, and this will affect usage. 

Interface affecting usage 

Barriers to access, such as requirements to register, have proved to be 
a major deterrent to use (Bishop 1998). Charging user fees also limits 
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access. Only in an unrestricted economic environment can demand 
be measured accurately. 

The presence of links that take the user directly to the full text of 
articles from the library's online catalog or from a Web list of journal 
holdings results in higher usage levels. Access data from Elsevier 
and MCB indicate that a high percentage of current use reflects the 
behavior of researchers who browse by a familiar journal title rather 
than that of general users who are searching for information on a 
subject. Tom Peters, director of the Center for Library Initiatives at 
the Committee for Institutional Cooperation (CIC), believes that ac- 
cessibility is one of the crucial and complex factors affecting use. 

The user's experience of the interface also will significantly af- 
fect the results. Both Academic Press and the American Institute of 
Physics (AIP) noted that they experienced surges in usage after they 
introduced new platforms that simplified navigation and access. 

Economic model 

As long as the journal, rather than the article, is the primary unit of 
sale, statistics will be collated by journal title. Academic Press pio- 
neered a site license for consortia that includes aU journals published 
by the Press and gives the user unlimited access to aU articles. In this 
model, titles that are highly used will have a lower cost per-use and 
be perceived as a better value. 

The trend toward offering a large database of journals from 
which the user selects articles gives rise to new economic models. 

The PEAK project, in which Elsevier titles were loaded at the Univer- 
sity of Michigan, allowed users to access, for a nominal additional 
cost to the library, articles from journals to which the university did 
not subscribe. 

Some librarians have begun to develop analyses based on article 
usage and cost per article. The hazard of pricing per-use is that value 
is associated with productivity of an article rather than with other 
measures. Pricing solely by usage may work with popular titles, but 
it ignores the importance of Uttle-used titles that have an impact on 
research. 

User privacy 

The topic of privacy applies both to data collected on individual us- 
ers and to data on libraries shared within consortia. 

Data that publishers currently provide on journal use do not re- 
veal specifics about any individual user, but present a summary of 
activity by journal title. However, publishers who offer personalized 
or customized services, such as e-mail alerts, must retain user-specif- 
ic information in order to deliver such services, and this requires that 
they establish policies about how they intend to use the data. 

Librarians have a tradition of protecting the privacy of users 
with policies regarding book circulation records. They are equally 
committed to protecting users' rights in the electronic environment. 
Publishers are considered responsible for how they use data they col- 
lect. Protecting the user's personal information is not just a courtesy: 
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it is a legal obligation (Rothman 2000). Publishers that collect such 
data need to develop policies for how it will be used throughout 
their organizations. Moreover, after a policy is established, it is essen- 
tial that the company monitor compliance internally. 

While the U.S. Federal Trade Commission is concerned that com- 
panies adhere to the privacy policies that the companies themselves 
have defined, there are more stringent requirements in Europe. Pub- 
lishers who sell electronic publications in Europe must have privacy 
policies that indicate what information is collected, how it is used, 
how the user can change it, with whom it is shared, and how users 
can opt out. 

For example, Elsevier's ScienceDirect product alerts users to the 
fact that they may be entering personal information when they take 
advantage of customized services or order documents. The compa- 
ny's privacy policy states that "Your information is kept confidential, 
unaltered and is used only by ScienceDirect and its parent company 
Reed Elsevier to administer your ScienceDirect /Elsevier Science rela- 
tionship." 

JSTOR is an industry leader in the area of statistics. Its privacy 
policy states that "No data are provided that would allow for the 
identification of the activity of individual users. Librarians can gen- 
erate reports only for their own institution's usage activity, although 
average usage activity at similar institutions is provided for compari- 
son purposes." The concern for privacy at some libraries extends to 
JSTOR's own statistics, and the library may want its usage data 
shared only with its permission. 

In its guidelines, ICOLC states that "Statistical reports or data 
that reveal confidential information about individual users must not 
be released or sold by information providers without permission of 
the consortium and its member libraries" (1998). 



Library Issues 

While usage statistics validate the library's investment, they also pro- 
vide insights into usage patterns that indicate the need to access a 
broader spectrum of titles than previously owned. This raises ques- 
tions about the approach to building collections on a "just-in-case" 
basis compared with new models that incorporate on-demand acqui- 
sition. 

Budget justification 

Reference librarians lament that students act as if a resource does not 
exist if it is not online. Declining book circulation and rapid growth 
in the use of electronic resources indicate that users are shifting from 
print to electronic resources. 

Libraries can tell which Web sites users are going to for informa- 
tion, but once users reach the publisher's site, their activity can be 
tracked only by the publisher. This means the library is dependent 
on the publisher to provide it with data vital for its internal reports. 
High usage demonstrates a good investment to administrators who 
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approve budget increases. For example, one library used statistics on 
after-hour use to show how the availability of electronic journals ex- 
tended the library's services. 

Impact on selection 

Recent data from OhioLINK show that more than half of the articles 
selected by users come from journals not currently held by the li- 
brary (Sanville 2000). There is increasing evidence from both libraries 
and publishers that current holdings are too limited to meet user de- 
mand, a trend that points to the benefits of user-driven selection pro- 
cedures. The emerging models for article selection from a database of 
electronic journals challenge libraries to restructure their approaches 
to collection development and create new models to meet their users' 
needs. 

Publisher Issues 

Publishers who have experience with their own usage statistics are 
becoming less worried about cancellations because they see that li- 
brarians are still processing the data, rather than reacting to it. Many 
publishers are still concerned, however, that because there is no con- 
text for most usage data, it can be misunderstood. 

Internal applications 

As publishers come to terms with the costs of developing their capa- 
bility to collect and analyze usage statistics, they find multiple appli- 
cations for such information internally. For example, the systems 
staff uses such data to budget for new hardware. The product-devel- 
opment staff analyzes how users access content. Marketing is inter- 
ested in how users find the site. The sales staff wants to know about 
the level of activity of their customers, and the editorial staff wants 
data on the most requested search terms. 

Quantitative Measures 



The establishment of accepted means for producing reliable and use- 
ful data can be viewed as a two-phase process. In phase one, pub- 
lishers reach agreement with each other and with librarians about 
what data are required and what standards should be adopted for 
collection and delivery. Once comparable data are available, it will be 
possible to analyze and draw conclusions from the data — phase two 
of the process. 

ICOLC guidelines were created to address a variety of files, in- 
cluding bibliographic databases, which are focused on simultaneous 
users. As a result, they reference turnaways and menu selections that 
do not apply to sitewide licenses for access to journal databases. 

There is the potential to learn a great deal about users and their 
behavior; however, at this early stage, experienced librarians agree 
that it is best to focus on only a few measures. OhioLINK Executive 
Director Tom SanvUle notes that of the criteria in the ICOLC guide- 
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lines, the only applicable measure of use of an electronic journal is the 
number of times an article is viewed, printed, e-mailed, or download- 
ed. 

There are three steps in processing the raw data that servers pro- 
duce on visits to the sites of Web-based journal collections. 

1. Data Collection: On the basis of the needs of internal and external 
users, each host site decides what data elements it will collect. For 
example, do downloads include both HTML pages viewed on the 
screen and PDFs downloaded? Sizable log files are reviewed to 
extract and summarize data. Rather than use locally developed 
software, systems staff often prefer commercially developed soft- 
ware because it usually offers more features, enhanced graphics, 
and customer support. 

2. Analysis: Decisions are made as to whether the analysis will be per- 
formed ahead of time on preselected fields or whether librarians 
can select the data elements, specify a time frame, and create their 
own reports. 

3. Presentation: Once the content of the report is determined, deci- 
sions must be made on the currency of the data (i.e., real-time or 
periodic uploads on a nightly, weekly, or monthly basis), whether 
the files can be exported, and whether the data are pushed to the 
library via e-mail or the library must retrieve them. 

What Are We Measuring? 

Once publishers agree upon basic data elements to be collected, ana- 
lyzed, and presented in a standard way, they will be able to produce 
the first generation of comparable statistics. Typically, what is being 
used (content), who is using it (user), and how the database is being 
used (activity) are measured. When the content is used and how the 
data will be presented are other questions of interest. 

What is being used? 

For a full-text journal database, the ICOLC guidelines define the use 
of articles as viewing, downloading, printing, or e-mailing the full 
text. Summaries of data usage by journal title can help librarians de- 
cide what titles to add, change, or delete and can assist publishers in 
determining the health of the journal. Comparing statistics on the 
abstracts and tables of contents viewed with statistics on downloads 
may provide insights on how users navigate the database. 

Who is using the content? 

Analysis by IP address range can sometimes reveal the academic de- 
partment that has requested the article and can be useful in assessing 
the need to train users or make them aware that a resource is avail- 
able. When users are remote and are assigned a dynamic IP address, 
it is difficult to determine the user's discipline. Some libraries at- 
tempt to combine usage data provided by the publisher with their 
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own data to determine the extent to which they are serving remote 
users. 

How is the database used? 

A "hit" registers each time the server receives a request to act (e.g., to 
do a search, to view an abstract, or to download an article). The type 
of hit can indicate how the user approaches the system — for exam- 
ple, to browse a journal title or to search for specific information. 
Searches include requests to search by title, author, or subject. Brows- 
ing includes accessing the full text from the journal title, issue, and 
table of contents. In menu-driven systems, the menu items that the 
user selects need to be counted. Direct access with a citation or URL 
may be counted separately. 

When is the content being used? 

Measures of activities (hits, sessions, downloads) are summarized by 
the hour, day, week, month, and year. The systems staff analyzes 
data from server logs to determine the ability of the server to meet 
the load during periods of peak demand. When systems provide ac- 
cess to simultaneous users, the number of times users are turned 
away also needs to be captured to measure unmet need. 

How will the data be presented? 

The degree to which statistics are useful to a library depends on how 
the data are presented. Librarians want to be able to do the following: 

• Query the system and specify the time period covered 

• Access two years of data online to monitor growth 

• Download data as a "comma delimited" file to load into a spread- 
sheet 

• Graph usage across years or titles or compare usage with that of 
other libraries 

• Access data that are real-time or that are updated nightly 

• Establish a profile and routinely receive the results by e-mail 

Once publishers have established a well-defined and consistent 
set of data, additional analysis will support exploration of data relat- 
ed to behavior and use and will attempt to address questions related 
to value. 



Data Reliability 

A standard methodology for collecting and analyzing data is neces- 
sary to ensure that both publishers and librarians have data that are 
comparable and reliable. 

With a full-text journal database, the conversation centers on 
three measures: hits (equated to searches), sessions (equated to us- 
ers), and documents used (equated to downloads). However, mea- 
suring hits or sessions can yield misleading information. The number 
of hits will vary, depending on network access and telecommunication 
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factors. Likewise, the number of sessions will vary because of time- 
outs and other network protocols. 

Conversations with the staff who implement the statistics func- 
tion revealed a common process of learning related to the design and 
development of internal processes to produce valid data. Any given 
method of implementation can produce varied results, based on the 
software selected and the diverse nature of local systems architecture. 

Caching: Caching allows frequently accessed Web pages to be 
stored on a server to improve performance. When users access 
cached materials, these actions are not counted as a hit on the host 
database. Consequently, for popular materials, statistics supplied by 
the host are likely to underestimate usage. 

Log files: Web server log files are a good means of helping admin- 
istrators gauge the demands on a server. Such logs measure requests 
for specific documents on a server, but they cannot show exact usage 
because caching is often employed and because users are assigned 
variable IP addresses (Bauer 2000). 

Although log files are not designed to describe how people use a 
site, they do allow analysis of the source of links into a Web site and 
therefore can be used to determine which sites are generating traffic. 
Such information can be useful to publishers. For example. Science 
News Online (SNO) learned that one of its articles, which had been 
cited on another Web site that was linked to the SNO Web site, had 
brought in a high volume of visitors to the SNO site. As a result, 

SNO decided to mount the full text of the cited article (Peterson 2000). 

Software: Bridget Pairaudeau, who handles the statistics function 
for Institute of Physics Publishing, noted that an off-the-shelf pack- 
age such as NetTracker can be used to screen out robot contamina- 
tion as well as data from internal testing, demonstrations, training 
sessions, and trials that skew the usage data. It was found that 
NetTracker records HTML articles viewed rather than PDFs down- 
loaded; this could be a concern for publishers that offer both func- 
tions. 

Summary and Next Steps 



The evolution of electronic journals to publisher-hosted databases of 
journal articles shifts the burden of measuring use from libraries to 
publishers. Although publishers need to collect data for their own 
purposes, the associated costs are considerable. Additional work is 
required to produce data that are meaningful and useful for librarians. 

Fears of conflicting motivations between publishers and librari- 
ans are diminishing as publishers become familiar with their own 
data and focus on the challenge of producing useful statistics. Con- 
cerns about comparability are valid and need to be addressed in a 
meeting where publishers who have already implemented statistical 
functionality can share what they have learned. Issues to be dis- 
cussed at such a meeting would include producing useful data and 
interpreting the data. 
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Producing usefu! data 

Publishers and librarians share concerns about the lack of standards 
for collecting and presenting data and the lack of context for draw- 
ing conclusions. Putting together a complete picture of use with data 
in multiple formats and from multiple sources is an additional chal- 
lenge for both. 

Given the variety of platforms and software packages, publish- 
ers need to learn from each other about the variables and to agree 
upon an approach that will produce consistent measures of use. If a 
group of individuals involved in producing statistics were to pool 
their intelligence and produce guidelines, it would greatly advance 
the state of the art. Once a critical mass of publishers is producing 
consistent data, it will be clear to others who are just beginning their 
work what data are needed and how to collect and deliver valid 
data. 

The industry is at the first stage of creating the capability to gath- 
er statistics, establish standards, and deliver comparable and reliable 
data. In another year, new systems will emerge that rely upon data 
mining and analysis and that focus on understanding user behavior. 

interpreting the data 

Users want and need access to a much broader range of material 
than that which can be owned affordably in print. Emerging pricing 
models and consortial arrangements that provide users with access 
result in data that show higher levels of use of nonowned titles. In- 
terpretations of these data vary; for example, they include concerns 
that the right titles are not being bought as well as the recognition 
that the information industry is moving from a supply-driven model, 
with preselected packages of information, to a demand-driven mod- 
el, where users choose what they need from a wide array of options. 
Making users aware of what is available and increasing the ease of 
access will require cooperation between publishers and librarians. 

Librarians and publishers need to understand users and their 
information-seeking behaviors in ways that were not previously pos- 
sible or necessary. As intermediaries between the author and the 
reader, publishers and librarians must learn how best to serve their 
users. Doing so will require further analysis. 

Recommendations for next steps 

Publishers are discovering what data they need to provide and how 
to provide it. There is no forum where staff working on statistics can 
share their understanding of the technology and make it easier for 
those who have just begun to tackle these issues. To facilitate the de- 
velopment of statistics in the industry, an organization such as the 
Council on Library and Information Resources might wish to spon- 
sor an invitational meeting that would' enable those involved in this 
area to discuss the issues they have encountered and to explore the 
development of guidelines for aU participants. 

Associations involved in creating standards and guidelines on 
data collection are focused on defining the data elements and deter- 
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mining what is currently being done. No one is working directly 
with publishers who have developed data, understand the variables, 
and are in a position to provide guidance so that those producing 
data can be consistent in their implementation. CLIR is well posi- 
tioned to host such a meeting, which should include representatives 
from the publisher, vendor, and library communities. Preliminary 
feedback from publishers and aggregators has been favorable. The 
author welcomes additional input on the structure and desired re- 
sults of such a forum. 
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APPENDIX A: 

Interviews with Librarians and Publishers 



OhioLINK, Los Alamos National Labs (LANL), and the Florida Cen- 
ter for Library Automation (FCLA) all host journal databases. They 
were selected for inclusion in this white paper because they had to 
develop the same capabilities being requested of publishers. Villano- 
va University was included because it has closed stacks for its bound 
journals, which means that it has good measures of use. James Mul- 
lins, the university librarian at Villanova, was on the task force that 
created guidelines for the statistics that JSTOR delivers. 

Academic Press, Elsevier, MCB, and the Institute of Physics 
(10 P) host their own journals and have experience with collecting 
statistics. The American Institute of Physics (AIP) and Association 
for Computing Machinery (ACM) are in the process of developing 
this capability. 

JSTOR and Catchword both host content from a variety of pub- 
lishers. JSTOR was part of the initial discussions about Library re- 
quirements, while Catchword is further developing its statistics ca- 
pability. Like the library hosts, these providers have a standard 
platform that provides consistent data to enable comparisons. 



Libraries 

OhioLINK 

Because OhioLINK staff developed the statistics capability when 
they designed the overall system, its initial set-up costs are not readi- 
ly identifiable. Ongoing support is provided by two staff members 
who perform many other duties. 

In addition to issuing regular usage reports, OhioLINK has taken 
advantage of the opportunity to perform further assessment of us- 
age. According to Executive Director Tom Sanville, this assessment 
shows that, although every title in the database has been used, 40 
percent of the titles represent 80 percent of the downloaded articles, 
while another 40 percent of the titles received only 10 percent of the 
use (SanviUe 2000). This prompted David Kohl, director of the li- 
brary at the University of Cincinnati and a member of OhioLINK, to 
suggest that low usage might make the latter titles candidates for 
lower pricing (Kohl 2000). 

A surprising discovery is that more than half (58 percent) of the 
articles downloaded for all OhioLINK libraries were not held in print 
by the libraries (SanviUe 2000). In each institution, patrons make use 
of a much wider number of journals than those held in print. 

This finding speaks to limitations imposed by budgets on the 
selection process and the importance of letting the user choose from 
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a larger file of material. In a paper presented at Oxford 2000, David 
Kohl noted that presenting users with a database of journal articles 
allows them to drive the selection process in a way that is similar to 
the current practice under which vendors supply librarians with 
books on approval. 

Los Alamos National Labs 

The LANL Library has gone through three stages of development, 
according to Director Rick Luce. The data the library collects depend 
on how far it parses its log files. The first stage, which entailed pars- 
ing LHSJIX logs and "'beat code," cost $20,000 and required nominal 
staff support. The second stage, which involved producing static us- 
age data on the basis of scripted code, cost $50,000; one staff member 
performed this activity. The third stage, designed to enable the user 
to perform a query and export the results, may cost $250,000. Pro- 
gramming staff will be involved in doing the analysis. 

LANL has 3,500 electronic journals available to its users; of these, 
2,000 titles are loaded locally and 1,500 are accessed remotely. When 
LANL did a trial with Elsevier, all titles in the database were used 
and the participating libraries did not own the most-used titles. 



Apr 99 - Mar 00 
Electronic Journal Center Titles 
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Luce concludes that librarians do not know exactly what users need, 
confirming the discovery process in research and the learning curve 
in the electronic environment. 

LANL enables its users to connect to full text from links within 
secondary publications, from browsing selected titles, and from per- 
forming subject searches. It takes six months for users to discover, 
remember, and fully use a new service. Keys to success are to ensure 
that links are established, to allow sufficient ramp-up time, and to 
promote awareness. LANL has expanded its electronic holdings 
since 1995, and user satisfaction with library services has increased 
dramatically. 

Florida Center for Library Automation 

FCLA is the central agency that supports the online catalogs of the 10 
universities in Florida. Like OhioLINK and LANL, FCLA loads a 
number of full-text journal databases, for which it produces statistics 
locally as well as links to publishers' remote sites. 

FCLA would like to track the number of searches, the number of 
documents retrieved, and the number of requests denied. The num- 
ber of hits is not a valid indicator of use because there is no consis- 
tent way to measure them. The number of articles viewed by journal 
title is counted when the PDF is viewed. Reports on usage of full-text 
journals are updated nightly in a formatted report that the librarians 
can download. 

When users link to a publisher's database, they have effectively 
left their home system. The library can tell which database they 
linked to, but it cannot track actions taken on the publisher's Web 
site. Consequently, libraries must rely on publishers for usage data 
and then merge such information with their own local data. 

Villanova University 

Villanova University Library Director James Mullins noted that stu- 
dents today rely solely on electronic publications because of their 
ease of access and use; consequently, they have a limited view of the 
available content. 

Villanova can track the usage of its bound print journals because 
they are in closed stacks. Use of print journal collections was grow- 
ing until 1995, when electronic databases were made available to us- 
ers, who also began to access the Web. Since then, the library has 
seen a dramatic decUne in the use of print materials and a steady in- 
crease in the use of electronic resources. 

In an attempt to collect some data locally on student and faculty 
use of remote databases, VUlanova analyzed its log summaries, 
which show the total number of times a database is accessed. These 
data are put into a spreadsheet as a frame of reference along with 
vendor-supphed data and are compared with the prior year's totals. 
Assistant Director for Public Services Louise Green noted that train- 
ing usage should be counted separately so as not to skew the totals. 
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Publishers 

Elsevier 

Elsevier has at least two staff devoted to managing usage data from 
its ScienceDirect database installations. Most libraries subscribe to 
only a portion of the 1,170 titles in Elsevier's database; therefore, 
data on the use of nonsubscribed titles are helpful in considering the 
addition of electronic or print versions of a title. 

Although Elsevier is committed to providing as much informa- 
tion as the customer believes is useful, staff acknowledge that cus- 
tom reports are not economical to generate. The company can see the 
impact of marketing on journal usage, and it has a staff of account- 
development managers devoted to training librarians and users on 
the system. As the volume of articles used rises, the cost per use 
drops. 

To keep current in their field, researchers scan about a dozen 
journals regularly by browsing their tables of contents. This activity 
is reflected in how the database is used when researchers select a 
journal title from a list and then browse the tables of contents of vari- 
ous issues, rather than search by subject, author, or title. 

Elsevier has paid particular attention to global requirements for 
a privacy poUcy, which appears on a full page on the Web site for Sci- 
enceDirect. Some customized services, such as an e-mail address for 
an alerting service, cannot be provided if the user does not provide a 
minimal amount of personal information. To ensure privacy, all data 
on individual users are scrubbed at the organizational level before 
being processed and aggregated. 

Academic Press 

Academic Press found that the off-the-shelf software packages that 
summarize hits do not provide the data that libraries need. It is hir- 
ing a full-time statistician and measurement analyst to help address 
the issue. The company experienced a dramatic increase in usage 
when it introduced its new platform in the fall of 1999. 

Data gathering is complicated because Academic Press's journal 
database (IDEAL) is loaded on remote sites such as OhioLINK and 
OCLC, and Academic Press needs to combine data from several 
sources for a complete picture of usage of its own journals. Data are 
used internally by sales, accounting, and editorial staff to examine 
correlations and draw conclusions about the cost per-article for each 
institution. This allows the publisher to understand how the library 
might equate the cost per-article to a relevant measure indicating val- 
ue. 

In the print world, subscription revenues indicate the health of a 
journal. When that journal is part of a database, the equation changes 
completely, since some of the articles used were in previously non- 
subscribed titles. 

For every 1.5 log-ins to the database, one article is downloaded, 
and for every abstract viewed, there is one article downloaded. Aca- 
demic Press summarizes the total number of log-ins by journal and 
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of articles downloaded by journal each month for each institution 
and consortium. 

Chrysanne Lowe, director of online sales and marketing, noted 
that the journals that have the most articles downloaded are consid- 
ered the company's most successful titles. These are large journals 
with many articles. The list of journals in greatest demand changes 
when the number of articles downloaded is compared with the num- 
ber of articles published in the title. 

Philosophically, Academic Press is opposed to a business model 
in which charges increase with use because it discourages use. Aca- 
demic Press offers marketing support with promotional items and 
coordinates training with librarians and faculty members. 

MCB University Press 

In addition to the normal data on time-of-day activity that help it de- 
termine the load on systems, the system at MCB University Press 
tracks hits and sessions. To learn how users come to the site, MCB 
also analyzes the top referring sites, top browsers, top entry pages, 
and the most popular and least popular pages in the database. 

MCB University Press is interested in knowing which institu- 
tions generate the most requests and which articles and journals are 
most requested. How users search is also of interest; for that reason, 
data on the tables of contents, search pages, and browse pages are 
collected. 

Heavy use of the tables of contents through the browse functions 
indicates that many users know the title they wish to see. However, 
MCB discovered that the most-used titles at some institutions were 
the first titles in the alphabet. This indicates that users are learning 
how to use a system and suggests the need to evaluate the interface 
or provide more training. 

institute of Physics 

Bridget Pairaudeau, producer of electronic publications at lOP, just 
completed the design of lOP's statistics form for internal use. It al- 
lows staff to select the following variables: 

• Who: user files and the subscription records from lOFs internal 
systems 

• What: data from log files on the type of activity and tim.e frame 

• View: display options, such as grouping subscribed journals 

Users of the lOP system also have the option of creating a graph 
by selecting elements for the x and y axes. If they chose to graph us- 
age of Web pages on both axes, they can show navigation to full text 
from the table of contents compared with navigation from the subject 
keyword search. Data on the use of options that can be customized, 
such as profiling, use of filing cabinets, and activating a table of con- 
tents alerting service, show which features are most used. 

The editorial and marketing staffs are interested in knowing 
which articles and journals are most requested and which institu- 
tions are most active. The sales department is interested in the level 
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of use by specific customers, and system designers want information 
they can use to enhance features, navigation, and usability. 

lOP screens out data on internal use, guests, free use, trials, pro- 
duction applications, and robot attacks, because they can greatly 
skew statistics. When lOP's internal data analysis did not match that 
of the commercial package, staff discovered that NetTracker counts 
HTML views but not PDF downloads. 

American Institute of Physics 

Doug LaFrenier, director of marketing at AIP, noted that the market 
has changed dramatically. Providing statistical data to libraries rep- 
resents a new set of responsibilities for publishers — one that has as- 
sociated costs. LaFrenier's primary concern is the lack of standards, 
which makes it impossible to compare data. 

AIP is concerned that it is undercounting because its system does 
not count searches and requests for abstracts. It counts only requests 
for the full text of an article that requires either a subscription or pay- 
per-view access. At the same time, AIP has discovered that one of the 
interfaces was triple counting downloads because of the way it 
grabbed the content. 

The American Institute of Physics, working with the American 
Physical Society (APS) has devoted much of one full-time program- 
mer's activity to developing Web-based statistics that libraries can 
access for their own use. The statistics, which will be available to oth- 
er publishers that AIP hosts, are planned for delivery early in 2001. 

AIP demonstrated the system at the Special Libraries Association 
2000 meeting. The demonstration showed year-to-date download 
statistics. Libraries who attended this session persuaded AIP that li- 
braries want to be able to specify their own time periods. They also 
want to be able to compare current data with information from prior 
years. AIP found it difficult to identify who within the library should 
have rights to view this information. 

Previously, AIP had given its own publishing customers reports 
from the server logs that summarize activity by journal title. The 
company also has analyzed time-of-day performance data to support 
decisions in running an online journal platform. It has been able to 
identify the most active journals and accounts, and believes that 
much of the information developed for online publishing customers 
will be useful in developing usage-statistics reports for libraries. 

Anyone using the AIP Web site has the option of buying an arti- 
cle online. Sales grew significantly when the company simplified its 
interface and reduced the number of steps required for the user to 
obtain the article. This further supports the importance of ease of in- 
terface on usage. 

Association for Computing Machinery 

The Association for Computing Machinery is evaluating what statis- 
tics need to be collected. As staff experimented internally with data, 
they found that the most frequently downloaded article in any given 
month was neither a current article nor one they would have expect- 
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ed to be so popular. High-use article titles provide clues for editors 
about the topics in demand. 



Providers 

JSTOR 

The ICOLC guidelines are based on those developed by a task force 
in conjunction with JSTOR in 1997. JSTOR data are updated nightly 
and can be queried and exported to a spreadsheet. Individual site 
data can be compared with average data for all sites in the same JS- 
TOR classification and with summary data for all JSTOR titles. Both 
publishers and librarians can sign on and retrieve data. 

Data presented include the number of pages viewed, PDFs print- 
ed, searches conducted, and tables of contents browsed. Since JSTOR 
includes as articles all items (e.g., reviews and letters), it lists full- 
length articles separately for clarity. 

In a presentation at the Conference on Economics and the Usage 
of Digital Library Collections, JSTOR President Kevin Guthrie ob- 
served that the articles that are most often downloaded are not those 
that advance research or that are most often cited (Guthrie 2000). 
"Value needs to be clearly defined as libraries consider acquisition 
and cancellation decisions for electronic content," Guthrie stated. 
(Marthyn Borghuis from Elsevier noted that citations reflect author 
activity while usage reflects reader activity). 

The notion of perishability of content varies with the discipline. 
The average age of the most-used articles was also surprising: 13 
years in economics and 32 years in mathematics. When there are a 
small number of total accesses for the discipline, the actions of a few 
people can sway the results. 

Guthrie cautioned that usage does not necessarily equate to val- 
ue in the research sense. "Older articles may be absolutely vital to 
the continuation of high-quality scholarship and research in the field, 
but that may not lead to extensive use," he said. 

Catchword 

Catchword delivers service that is paid for by the publishers, who 
decide what information to share with libraries. Catchword is ex- 
panding its statistics ability according to ICOLC guidelines, and it 
will have data that can be used by libraries. Catchword has decided 
to add tumaway statistics that reflect the number of times a user at- 
tempts to access the full text of an article in a journal to which the 
library does not subscribe. Catchword can also track pay-per-view 
access. Although the company has a single source to produce these 
data, its challenge is to summarize data from 11 servers around the 
world. 
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APPENDIX B: 

ICOLC Guidelines 



Statistical Measures of Usage of Web-Based 
Indexed, Abstracted, and Full Text Resources 
(November 1998) 

1. Requirements 

Each use element defined below should be able to be delineated by 

the following subdivisions: 

• by specific database provider 

• by each institutionally defined set of IP addresses /locators to sub- 
net level 

• by total consortium 

• by special data element passed by subscriber (e.g., account or ID 
number) 

• by time period. Vendor's system should minimally report by 
month. For each month, each type of use should be reported by 
hour of the day, and vendor should maintain 24 months of histori- 
cal data. 

Use elements that must be provided are: 

• Number of queries /searches categorized as appropriate for the 
vendor's information. A search is intended to represent a unique 
intellectual inquiry. Typically a search is recorded each time a 
search form is sent/ submitted to the server. Subsequent activities 
to review or browse among the records retrieved or the process of 
isolating the correct single item desired do not represent addition- 
al searches, unless the parameter(s) defining the retrieval set is 
modified through resubmission of the search form, a combination 
of previous search set, or some other similar technique. 

• Number of menu selections categorized as appropriate to the ven- 
dor's system. If display of data is accomplished by browsing (use 
of menus), this measure must be provided (e.g., an electronic jour- 
nal site provides alphabetic and subject-based menu options in 
addition to a search form. The number of searches and the num- 
ber of alphabetic and subject menu selections should be tracked). 

• Number of sessions (logins), if relevant, must be provided as a 
measure of simultaneous use. It is not a substitute for either query 
or menu selection counts. 

• Number of turnaways, if relevant, as a contract limit (e.g., re- 
quests exceed simultaneous user limit). 

• Number of items examined (viewed, marked or selected, down- 
loaded, emailed, printed) to the extent these can be recorded and 
controlled by the server rather than the browser. 
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O Citations displayed for A&I databases 
O Full text displayed by title, ISSN with title listed or other title 
identifier as appropriate. 

1. Tables of Contents displayed 

2. Abstracts displayed 

3. Articles or essays, poems, chapters, etc., as appropriate 
viewed (e.g., ASCII or HTML) or downloaded (e.g., PDF, 
email) 

4. Other (e.g., image/ AV files, ads, reviews, etc., as appropri- 
ate) 

2. Privacy and user confidentiality 

Statistical reports or data that reveal confidential information about 
individual users must not be released or sold by information provid- 
ers without permission of the consortium and its member libraries. 

3 . Institutional or consortial corrfidentiality 

Providers do not have the right to release or sell statistical usage in- 
formation about specific institutions or the consortium without per- 
mission, except to the consortium administrators and member librar- 
ies. Use of institutional or consortium data as part of an aggregate 
grouping of similar institutions for purposes of comparison does not 
require prior permission as long as specific institutions or consortia 
are not identifiable. When required by contractual agreements, infor- 
mation providers may furnish institutional use data to the content 
publishers. 

4 . Comparative statistics 

Information providers should provide comparative statistics that 
give consortia a context in which to analyze statistics at the aggre- 
gate institutional (consortium member) level. For example, a group- 
ing for purposes of comparison should be compiled by the informa- 
tion provider (e.g., statistics from an anonymous selection of similar 
institutions), or it might be a grouping composed on demand (e.g., 
statistics from all campuses in a consortium, presented either anony- 
mously or not, as desired by the participating institutions). 

5 . Access / Delivery mechanisms / Report formats 

Access to statistical reports should be provided via web-based re- 
porting systems and be restricted by IP address or another form of 
security such as passwords. Institutions should be able to authorize 
access to their data by other institutions in the consortium if they de- 
sire. Information providers should maintain access to tabular statisti- 
cal data through their web site (updated monthly) which a partici- 
pant can access, aggregate and manipulate on demand. When 
appropriate, these data also should be available in flat files contain- 
ing specified data elements that can be downloaded and manipulat- 
ed locally. Information providers are also encouraged to present data 
as graphs and charts. 
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APPENDIX C: 

Related Industry Initiatives 



National Commission on Libraries and Information 
Science (NCLIS) 

http://www,nctis,gov/tibraries/lsp/statisthtml 

Denise Davis at the NCLIS has commissioned John Bertot and 
Charles McClure of Florida State University to undertake a project 
entitled the "2000 Internet Connectivity Study." The authors will 
measure the level of connectivity, public access, training support, 
and technology available for the staff and patrons of public libraries. 
Focusing primarily on aggregators of indexes that include full text, 
the study authors are gathering information on the ability of public 
libraries to report electronic database use. 

Association of Research Libraries (ARL) 

http://www,arl,org/stats/newmeas/newmeas,html 
Martha Kyrillidou manages ARL's New Measures Initiative, which 
includes E-Metrics, a major project that began in June 2000. E-Metrics 
focuses on the development of statistics and performance measures 
for the delivery of networked information resources and services. 
Twenty-three ARL member libraries are participating in a study led 
by Charles McClure and Wonsik (Jeff) Shim from the Information 
Management Use and Policy Institute at Florida State University. 

Scheduled for completion in December 2001, the E-metrics 
project has three phases. In the first phase, information will be gath- 
ered on ARL libraries' best practices in statistics, measures, process- 
es, and activities that pertain to networked resources and services. In 
the second stage, a methodology will be developed to assess the de- 
gree to which such data collection is possible and collected data are 
comparable among member libraries. In the third phase, a set of re- 
fined measures with data descriptions and guidelines for data collec- 
tion, analysis, and use will be proposed. A separate task force within 
the project wiU focus on vendors' statistics, i.e., the definition of data 
elements and terms, specific data that can be collected, and methods 
for reporting data to libraries. 

National information Standards Organization (NISO) 
http://www,niso. org/ 

Patricia Wand, director of the library at American University, and 
Denise Davis, director of statistics and surveys at the NCLIS, are 
leading the planning process for a review of the current standard on 
Library Statistics. This revision wiU address areas such as perfor- 
mance measures and the measurement of electronic services and re- 
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sources, which were not dealt with in the last review Formal discus- 
sions will begin at a workshop to be held in February 2001. 

EQUINOX 

http://equinox. dcu. ie./reports/piUst. htmt 
Funded by the European Commission, EQUINOX is designed to 
gain agreement on performance measures for the electronic library 
and develop an integrated software tool for use by European librari- 
ans. Building on earlier projects that focused on tools for book collec- 
tions (e.g., EQLIPSE, MINSTREL), EQUINOX will take the lead in 
developing electronic performance indicators. 

The indicators in this project are defined either in an Internation- 
al Standards Organization (ISO) document or by the project team. 
Team members use several methods to identify the percentage of a 
library's target population that is served and trained to use materi- 
als. These include on-site and remote sessions, downloads, cost per 
session and per download, level of workstation usage and number of 
rejected sessions, and percentage of the acquisitions budget spent on 
electronic resources. 

LibEcon 2000 

http://yswyv.Ubecon2000. org 

Funded by the Directorate General X (DG X) of the European Com- 
mission, this three-year project is nearing completion. It is focused on 
gathering consistent information about the libraries' development as 
information resources within European countries. The LibEcon 2000 
Web site was established to test and then generate an automatic 
means of collecting data from respondents in 29 countries. 

Project staff are working closely with the United Nations' Educa- 
tional, Scientific, and Cultural Organization (UNESCO); the Interna- 
tional Federation of Library Associations and Institutions (IFLA); the 
European Commission's central statistical agent (Eurostat); the Euro- 
pean Bureau of Library, Information, and Documentation Associa- 
tions (EBLIDA); and the appropriate committees of the ISO. 
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